ABSTRACT

In this work we demonstrate the ability of the Minimal Spanning Tree to duplicate the
information contained within a percolation analysis for a point dataset. We show how to
construct the percolation properties from the Minimal Spanning Tree, finding roughly
an order of magnitude improvement in the computer time required. We apply these
statistics to Particle-Mesh simulations of large-scale structure formation. We consider
purely scale-free Gaussian initial conditions ($P(k) \propto k^n$, with $n = -2, -1, 0$ and $+1$) in
a critical density universe. We find that in general the mass of the percolating cluster is a
much better quantity by which to judge the onset of percolation than the length of the
percolating cluster.

1 INTRODUCTION

In 1982 Zel'dovich suggested that the statistics derived from
a percolation analysis of the density distribution might
be useful in characterizing the topology of the distribution.
Soon after that Shandarin (1983) and Shandarin &
Zel'dovich (1983) explored the possibility that the percolation
properties of the galaxy distribution might provide a
useful measure of the topology of the observed large-scale
structure and act as a method for discriminating between
various cosmological models. Einasto et al. (1984) applied
percolation analysis to the CfA I catalog. Their findings
indicated that the large-scale distribution of galaxies was
consistent with a network-like structure. Bhavsar & Barrow (1983)
applied the percolation method to theoretical studies of
N-body models with power law initial conditions. In an $\Omega = 1$
universe they found that the $n = -1$ case agreed much better
with observations than the $n = 0$ case. Additional work
which centered on the CDM spectrum by Melott & Shandarin
(1983) and Davis et al. (1985) demonstrated that
the CDM model also has a connected, network-like structure
as opposed to a clumpy distribution. Dekel & West
(1985) pointed out that the percolation method would depend
strongly on the mean density of the sample, which
would make the method difficult to use for sparse datasets.
Recent work by Yess & Shandarin (1995) has demonstrated
that a percolation analysis of a continuous density field on
a lattice is able to provide robust statistical measures of the
underlying distribution which do not suffer from the earlier
criticisms of Dekel & West (1985).

In their search for an objective method for the identification
of filaments in observational datasets, Barrow,
Bhavsar & Sonoda (1985) introduced the Minimal Spanning
Tree (MST) into the cosmological literature. The MST is a
graph theoretical construct which has been used to quantify
patterns in datasets (Zahn 1971). Barrow et al. (1985) developed
several statistics based upon the MST from which they
were able to differentiate between a Poisson distribution of
points and several observational datasets. The introduction
of a bootstrap-based method, referred to as "shuffling",
allowed Bhavsar & Ling (1988) to establish that
the filaments in the CfA survey are real objects and not
visual artifacts. Recently Krzewina & Saslaw (1995) have
introduced several additional statistics based upon the MST
which they use to compare a subset of the Southern Sky
Redshift Catalog (SSRC) to an N-body simulation and a
Poisson distribution.



It is possible to construct the MST for any distribution
of points in space (Gower & Ross 1969; Abraham 1962;
Zahn 1971). The MST uniquely connects a set of $N$ points
(referred to as "nodes") with $N - 1$ lines (referred to as
"edges") in such a way as to minimize the sum of the lengths
of the $N - 1$ edges. Consequently closed paths are excluded.
This property has been exploited in the past as a way to
objectively identify filamentary features (Bhavsar & Ling 1988).
The skeletal pattern defined by the MST can then be used to
define a number of objective statistics (Barrow et al. 1985;
Krzewina & Saslaw 1995) which describe the clustering of
the data points.

In this work we first demonstrate that for a point data 
set the MST contains all of the information which is con- 
tained within a percolation analysis for that dataset. We 
then demonstrate the relative robustness of various percola- 
tion based statistical measures of the clustering for a Pois- 
son dataset. We should stress that rather than emphasizing a 
single number, such as the percolation threshold, we base our 
analysis on curves derived from the percolation analysis. We 
work with point datasets as the original percolation studies 
did. Thus we use the simulations "as they are" and the tech- 
niques can be applied directly to the positional data from 
galaxy catalogs. This avoids problems with boundary conditions
at the edge of the sample, and with determining a density
field from observational data. The time efficiency obtained
using the MST to investigate the percolation properties has
encouraged us to apply the statistics to a series of large
N-body simulations. These studies should, we hope, pave the
way for the eventual analysis of data from the large redshift
surveys currently underway.



2 PERCOLATION AS A SUBSET OF THE MST 

To build the MST we use Prim's algorithm (1957). The simplest
algorithm to construct explicitly the MST of a graph,
$\Gamma$, first picks an arbitrary node of $\Gamma$ and then adds the
connected edge of smallest length. This edge and the two nodes
at its ends form the partial tree, $\Pi_1$. The $k$th partial tree, $\Pi_k$,
is formed by adding to $\Pi_{k-1}$ the shortest edge connecting
$\Pi_{k-1}$ to any node of $\Gamma$ not already in $\Pi_{k-1}$. If $\Gamma$ contains
$n$ nodes then $\Pi_{n-1}$ is the required MST. Therefore, there
is clearly small-scale information in the tree because of the
way in which it is built, but the MST also contains large-scale
information because the sum of all the edge lengths is
a minimum. Once an MST is constructed, separation is the
operation of removing all edges whose length exceeds some
cutoff.
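
The construction and the separation step described above can be sketched in a few lines. This is only an illustrative O(N^2) version of Prim's algorithm in Python, not the code used for the results in this paper; the function names `prim_mst` and `separate`, the (length, i, j) edge format, and the non-periodic Euclidean distance are assumptions made for the example.

```python
import math

def prim_mst(points):
    """Return the MST of `points` as a list of (length, i, j) edges (O(N^2) Prim)."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # shortest known link from each node to the partial tree
    best_from = [-1] * n         # tree node that realises that link
    in_tree[0] = True            # arbitrary starting node
    for j in range(1, n):
        best_dist[j] = math.dist(points[0], points[j])
        best_from[j] = 0
    edges = []
    for _ in range(n - 1):
        # Add the shortest edge joining the partial tree to a node outside it.
        k = min((j for j in range(n) if not in_tree[j]), key=best_dist.__getitem__)
        edges.append((best_dist[k], best_from[k], k))
        in_tree[k] = True
        # The new tree node may offer shorter links to the remaining nodes.
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[k], points[j])
                if d < best_dist[j]:
                    best_dist[j] = d
                    best_from[j] = k
    return edges

def separate(edges, cutoff):
    """Separation: drop every edge whose length exceeds `cutoff`."""
    return [e for e in edges if e[0] <= cutoff]

# Hypothetical toy dataset: two close pairs joined by one longer edge.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 1.0, 0.0)]
mst = prim_mst(points)
print(len(mst), sum(e[0] for e in mst), len(separate(mst, 1.5)))
```

For N points the tree always has N - 1 edges, and separating it at a cutoff between the two short edges and the long one leaves two sub-trees.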

The percolation method we use was discussed in detail
in Bhavsar & Barrow (1983). The method consists of enclosing
individual data points by a sphere of radius $R$ centered
on the data point. All spheres which intersect form a cluster.
Typically a distribution of points and their enclosing
spheres is characterized by some critical value of $R$ at which
the length of the longest single connected chain of linked
spheres grows to of order the size of the system. When this
occurs the system is said to percolate (Hammersley &
Welsh 1980).
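
As an illustration of the sphere-linking step, the sketch below groups points whose spheres of radius R intersect, which for equal spheres means the centres lie within 2R of each other. It is a brute-force pairwise version written for clarity, not the Bhavsar & Barrow code. The names `sphere_clusters` and `linear_extent` are hypothetical, periodic boundaries are ignored, and taking the "length" of a cluster to be its largest extent along a coordinate axis is an assumption of the example.

```python
import math

def find(parent, i):
    """Root of the set containing i, with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def sphere_clusters(points, radius):
    """Group points whose spheres of `radius` intersect (centres <= 2*radius apart)."""
    n = len(points)
    parent = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):          # every pair is tested: O(N^2) per radius
            if math.dist(points[i], points[j]) <= 2.0 * radius:
                ri, rj = find(parent, i), find(parent, j)
                if ri != rj:
                    parent[ri] = rj
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(parent, i), []).append(i)
    return list(clusters.values())

def linear_extent(points, members):
    """Largest coordinate span of a cluster along any axis (assumed definition)."""
    dims = len(points[0])
    return max(
        max(points[i][d] for i in members) - min(points[i][d] for i in members)
        for d in range(dims)
    )

points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 1.0, 0.0)]
small = sphere_clusters(points, 0.5)   # links only pairs within distance 1
large = sphere_clusters(points, 1.0)   # links pairs within distance 2
print(len(small), len(large), linear_extent(points, max(large, key=len)))
```

The whole pairwise pass has to be repeated for each new radius, which is the cost the MST approach avoids.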

Now consider the following short thought experiment.
Assume that the data set has just percolated, so that the
radius of the spheres surrounding each data point is given
by the percolation threshold $l_{perc}$. Two points are then
directly linked only if they lie within $2 l_{perc}$ of each other,
so the largest separation between directly linked points in
any cluster will be $2 l_{perc}$. Therefore, if the MST for the
same dataset is separated using a separation length of
$2 l_{perc}$, subtrees will be identified which are separated by
more than $2 l_{perc}$. As a consequence, if we build the MST
and begin separating it, we should find that the linear extent
of the largest sub-tree exhibits exactly the same behavior as
the longest percolating cluster determined by a percolation
analysis. In fact, carrying through the thought experiment
for a series of edge lengths, we conjecture that by separating
the MST at every successive edge length, from the largest to
the smallest, we recreate the entire percolation analysis at
every possible sphere radius. Since this is accomplished by
just one construction of the MST and subsequent separating,
the saving in computational time is substantial. Our conjecture
has been verified by the numerical experiments which follow.
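
One way to picture the "separate at every successive edge length" step is to run it in reverse: start from isolated points and add the MST edges back in order of increasing length, recording the size of the largest cluster after each addition. The sketch below does this with a union-find structure. It assumes the MST is already available as a list of (length, i, j) tuples, and the function name `largest_cluster_curve` is hypothetical. A cutoff length l corresponds to a sphere radius of l/2.

```python
def largest_cluster_curve(n_points, mst_edges):
    """For each MST edge length l (ascending), the largest cluster size at cutoff l."""
    parent = list(range(n_points))
    size = [1] * n_points

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    curve = []
    largest = 1
    for length, i, j in sorted(mst_edges):
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # cannot happen for a true tree; guards against malformed input
        if size[ri] < size[rj]:
            ri, rj = rj, ri
        parent[rj] = ri                    # merge the smaller cluster into the larger
        size[ri] += size[rj]
        largest = max(largest, size[ri])
        curve.append((length, largest))
    return curve

# Hypothetical MST of four points: two unit-length edges and one edge of length 2.
mst_edges = [(1.0, 0, 1), (2.0, 1, 2), (1.0, 2, 3)]
curve = largest_cluster_curve(4, mst_edges)
print(curve)
```

When several edges share the same length, only the last entry at that length gives the cluster size for that cutoff. After one sort of the N - 1 edges the whole curve comes out of a single pass, in contrast to repeating a pairwise search at each radius.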

The growth of the percolating cluster as a function of
the separating length and also the sphere radius is shown
and compared in figure 1 for a Poisson distribution of $32^3$
particles. This plot shows only one such dataset. We have
tested this method using many random realizations and
consistently find the same result. To make the comparison
between the two curves more quantitative we compute the $L_1$
error, which we define as

$$L_1 = \frac{1}{N} \sum_{i=1}^{N} \left| l_i^{percolation} - l_i^{MST} \right| , \qquad (1)$$

where $l_i^{percolation}$ is the length of the percolating cluster
determined using the percolation code, and $l_i^{MST}$ is the length
of the longest cluster using the MST/Separation method
proposed here. For figure 1 we find $L_1 = 1.2 \times 10^{\ldots}$, clearly
at the round-off level. This result is typical for the method.

Figure 1. Percolation of a Poisson distribution using the
percolation code of Bhavsar & Barrow (1983) in open boxes and the
MST based algorithm presented here in filled circles. The
dimensionless separation length is defined as $l/n^{-1/3}$, where $n$ is the
particle density.

The percolation method scales as $O(N^2)$ for each radius
$R$, so identifying the percolation threshold requires a significant
amount of computer time. Though building the MST is
also an $O(N^2)$ algorithm, it is done only once, and the
separation process requires significantly fewer operations.
As a consequence, percolation analysis required roughly
46.5 CPU hours to produce the percolation graph in figure 1,
whereas the MST/separation method required only 4.8 CPU
hours on a Silicon Graphics Indigo2 to produce the identical
plot (also shown in figure 1), a saving of roughly an order
of magnitude in runtime. This saving can be crucial
depending on the size of the dataset.



3 PERCOLATION STATISTICS FOR A POINT 
DATASET 

In the past only the linear extent of the percolating cluster 
has been considered a primary statistic (Bhavsar & Bar- 
row 1983). In recent years Shandarin and his collaborators 
(Klypin & Shandarin 1993; Yess & Shandarin 1995) have ex- 
tended the percolation method to continuous density fields 
on a lattice and demonstrated the robustness of such methods
for studying the large-scale distribution of mass.

Figure 2. Robustness test for the linear extent of the percolating cluster for a Poisson distribution of particles. The upper left plot is the
entire $64^3$ dataset. The upper right is a $32^3$ subset, the lower right is a $16^3$ subset, and the lower left is an $8^3$ subset. The dimensionless
neighborhood radius is defined as $l/n^{-1/3}$, where $n$ is the particle density.

Figure 3. Robustness test for the total mass of the percolating cluster for a Poisson distribution of particles. The upper left plot is the
entire $64^3$ dataset. The upper right is a $32^3$ subset, the lower right is a $16^3$ subset, and the lower left is an $8^3$ subset. The dimensionless
neighborhood radius is defined as $l/n^{-1/3}$, where $n$ is the particle density.



Here we wish to present a new set of statistics based 
upon percolation using the MST based algorithm for point 
datasets. 

The first statistic we present for comparison is the usual 
linear extent of the percolating cluster as a function of the 
neighborhood radius R, the radius of the spheres surround- 
ing each point. The second is the mass of the percolating 
cluster normalized by the total mass in the simulation as a 
function of the neighborhood radius. To test the robustness
of each of these statistics we generate 10 Poisson distributions
and vary the number of particles in the box. Figure 2
shows the linear extent of the percolating cluster for four
particle densities. The first is $64^3$ particles in a box of volume
$64^3$, the second is a $32^3$ subset of the original $64^3$ particles
in the same volume, the third is a $16^3$ subset and the final
is an $8^3$ subset of the original $64^3$ particles.
Figure 3 shows the mass of the percolating cluster as a fraction
of the total mass in the simulation for the same four
subsets of particles. Each point is the average over the ten
realizations, and the error bars represent the $1\sigma$ deviations
from the averages.
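
The bookkeeping of this robustness test can be sketched as follows. The sketch is schematic and uses a much smaller box than the $64^3$ case so that it runs quickly. The helper names are hypothetical, random subsampling without replacement is assumed for drawing the subsets, and the sample standard deviation is assumed for the $1\sigma$ error bars, since the text does not specify either.

```python
import random
import statistics

def poisson_box(n_points, box_size, rng):
    """Uniform random (Poisson) points in a cubic box of side `box_size`."""
    return [tuple(rng.uniform(0.0, box_size) for _ in range(3)) for _ in range(n_points)]

def subsample(points, n_keep, rng):
    """Random subset of the particles, drawn without replacement (assumed scheme)."""
    return rng.sample(points, n_keep)

def dimensionless_radius(l, n_points, box_size):
    """l / n^(-1/3), where n is the particle number density."""
    density = n_points / box_size ** 3
    return l * density ** (1.0 / 3.0)

def mean_and_sigma(values):
    """Average and sample standard deviation over realizations."""
    return statistics.mean(values), statistics.stdev(values)

rng = random.Random(0)                 # fixed seed so the toy run is repeatable
full = poisson_box(512, 8.0, rng)      # 8^3 particles in a box of side 8
sub = subsample(full, 64, rng)         # a 4^3 subset in the same volume
print(dimensionless_radius(1.0, len(full), 8.0))
print(dimensionless_radius(1.0, len(sub), 8.0))
print(mean_and_sigma([1.0, 2.0, 3.0]))
```

Expressing radii in units of $n^{-1/3}$ is what allows curves from samples of very different density to be laid on the same axis.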

Interestingly, the mass of the percolating cluster appears
to be a more robust indicator of the onset of percolation than
the linear extent of the cluster. This is not unexpected. As the
particle density is reduced, shot noise due to undersampling
can have a much more serious impact on the length of the
cluster than on its mass. For instance, removing one particle
could change the length of the percolating cluster
dramatically, but it is unlikely to have much of an effect on
the total mass of the cluster. The mass curves (Figure 3)
allow one to accurately estimate the percolation length for
as little as 1/64th of the original particle density, which
corresponds to the $16^3$ subset. Even for the $8^3$ subset, where
there is a factor of 512 fewer particles, the percolation length
can be estimated to within 20% or so. Based upon the linear
extent of the cluster, both the $8^3$ subset and the $16^3$ subset
are of little use in estimating the percolation threshold.
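
The sampling factors quoted here follow directly from the particle counts. As a worked check, with $n_{64}$ denoting the density of the full $64^3$ sample (a symbol introduced only for this illustration):

```latex
\frac{32^3}{64^3} = \frac{1}{8}, \qquad
\frac{16^3}{64^3} = \frac{1}{64}, \qquad
\frac{8^3}{64^3} = \frac{1}{512},
\qquad\text{so}\qquad
n^{-1/3} = 2\,n_{64}^{-1/3},\quad 4\,n_{64}^{-1/3},\quad 8\,n_{64}^{-1/3}.
```

The mean interparticle separation $n^{-1/3}$ therefore grows by factors of 2, 4 and 8 across the three subsets, even though the density falls by up to a factor of 512.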



4 N-BODY METHODS 

The Particle-Mesh (PM) code used to generate the simulations
used in this work has been described in detail by
Melott (1986). The code is a standard PM code, except that
it uses a staggered grid to obtain slightly better force
resolution (Melott et al. 1988). The simulations use $128^3$ particles
on a comoving $128^3$ mesh. For the percolation studies here
we use a $32^3$ subset of those particles. We ran simulations
for four different power law initial spectra, $n = 1, 0, -1, -2$,
all for an $\Omega = 1$ universe. Ten realizations of each of the above
four spectra were performed. These realizations were studied
at the nonlinear wavenumbers $k_{nl} = 32, 16, 8, 4$ and at the
initial conditions; $k_{nl}$ is defined by

$$\sigma^2 = a^2 \int_0^{k_{nl}} P(k)\, d^3k = 1, \qquad (2)$$

where $P(k)$ is the initial power spectrum of the density
fluctuations, and $a$ is the cosmic expansion factor.
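
For a pure power law the definition of $k_{nl}$ can be evaluated in closed form. The following is a sketch under stated assumptions: equation (2) is read as $\sigma^2 = a^2 \int_0^{k_{nl}} P(k)\,d^3k = 1$, the amplitude $A$ is a symbol introduced here for illustration, and the integral is treated in the continuum limit, ignoring the discreteness of the mesh and the lower cutoff at the fundamental mode.

```latex
P(k) = A k^{n}, \qquad d^3k = 4\pi k^2\,dk
\;\;\Longrightarrow\;\;
\sigma^2 = 4\pi a^2 A \int_0^{k_{nl}} k^{\,n+2}\,dk
         = \frac{4\pi a^2 A}{n+3}\,k_{nl}^{\,n+3} = 1 \qquad (n > -3),
\\[4pt]
k_{nl} = \left[\frac{n+3}{4\pi A\,a^{2}}\right]^{1/(n+3)} \;\propto\; a^{-2/(n+3)} .
```

Under these assumptions each halving of $k_{nl}$ corresponds to the expansion factor growing by $2^{(n+3)/2}$, that is by about 1.41, 2, 2.83 and 4 for $n = -2, -1, 0$ and $1$ respectively.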
Table 1. Significance levels computed between the 4 N-body
models for the length of the percolating cluster statistic. The
dimensionless neighborhood radius is defined as $l/n^{-1/3}$, where
$n$ is the particle density.

Spectral Index    -2      -1               0                1

-2                1.0     3.03(10)^{-3}    1.31(10)^{-2}    1.78(10)^{...}

-1                -       1.0              2.75(10)^{-7}    2.75(10)^{...}

 0                -       -                1.0              0.99

 1                -       -                -                1.0

Table 2. Significance levels computed between the 4 N-body
models for the mass of the percolating cluster statistic.

Spectral Index    -2      -1               0                1

-2                1.0     3.22(10)^{-3}    4.51(10)^{-5}    5.22(10)^{...}

-1                -       1.0              1.42(10)^{-3}    6.91(10)^{...}

 0                -       -                1.0              0.91

 1                -       -                -                1.0



The percolation statistics were run on the ten realizations
at each evolutionary stage. Then the averages and the
$1\sigma$ deviations were computed. The results for the percolation
statistics are plotted in figure 4. To make the comparisons
more quantitative we compute the Kolmogorov-Smirnov (KS)
statistic and significance level (Press et al. 1992) between
each of the curves plotted in figure 4. These are presented
in tables 1 & 2.
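
For readers unfamiliar with the test, a minimal two-sample KS sketch is given below. It computes the statistic D as the largest distance between two empirical cumulative distributions and then an asymptotic significance level from the usual alternating series. The constants in `ks_significance` follow our recollection of the routine in Press et al. and should be checked against that reference. How the percolation curves are turned into samples for the test is not specified in the text, so the inputs here are plain lists of numbers.

```python
import math

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic D: maximum distance between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        while i < na and a[i] <= x:   # step the CDF of a past x
            i += 1
        while j < nb and b[j] <= x:   # step the CDF of b past x
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d

def ks_significance(d, na, nb, terms=100):
    """Asymptotic significance of an observed D; small values reject a common parent."""
    ne = na * nb / (na + nb)          # effective number of data points
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0                    # series converges slowly here; the value is 1 to ~1e-12
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))

# Hypothetical samples: fully separated versus identical.
d_far = ks_statistic([1, 2, 3, 4], [5, 6, 7, 8])
d_same = ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])
print(d_far, d_same, ks_significance(d_same, 4, 4))
```

A significance level near 1 means the two samples are consistent with a common parent distribution, while a very small value means they are not.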



5 CONCLUSIONS 

In this paper we have shown that one can generate the stan- 
dard percolation statistics from the Minimal Spanning Tree. 
This allows us a large increase in the speed with which we 
can perform a percolation analysis of a point dataset. Our 
calculations indicate that we can gain as much as a factor of 
10 in the computer time needed to perform the data analy- 
sis. This will become increasingly important as large redshift 
surveys become available. 

In addition we argue, based upon Poisson distributions,
that the percolation method is a robust statistical method
when the appropriate statistic is used. Past studies have
argued that the percolation threshold as determined from the
linear extent of the percolating cluster is not a robust
measure of percolation (Dekel & West 1985). We confirm that
result. Contrary to the conclusions of Dekel & West, however,
we find that by considering the behavior of the entire curve,
rather than focusing on a particular parameter of that curve,
a more robust estimate of the percolation properties is
possible. The mass of the percolating cluster appears to be
very robust with respect to sampling, as opposed to the linear
extent of the cluster, which is relatively poorly behaved. This
is not unexpected, as discussed in Section 3. Based upon this
statistic the percolation threshold can be reliably estimated
even when the particle density varies by large factors.

We conclude by applying these percolation statistics to
four N-body models with different scale-free Gaussian initial
conditions. Based upon our comparisons of the curves in
figure 4 using the KS test (see tables 1 & 2) it is clear that,
with the exception of the $n = 0$ and $n = 1$ models, the
percolation statistics can easily distinguish between the models.
Both percolation statistics considered here are able to
distinguish between models equally well (recall that a small
significance level indicates that the two distributions are not
consistent with the same parent distribution), but it is only the
mass of the percolating cluster which is strongly robust to
changes in particle density. Thus we conclude that percolation
may be a sensitive discriminator between cosmological models if
clustering is not too hierarchical.

Figure 4. The percolation statistics applied to a $32^3$ subset of particles from a $128^3$ N-body simulation. The 3 statistics run horizontally
while the 4 different initial conditions run vertically. The dimensionless neighborhood radius is defined as $l/n^{-1/3}$, where $n$ is the particle
density.



